DirichletRank: Ranking Web Pages Against Link Spams

Authors

  • Xuanhui Wang
  • Tao Tao
  • Jian-Tao Sun
  • ChengXiang Zhai

Abstract

Anti-spamming has become one of the most important challenges for web search engines and has recently attracted increasing attention in both industry and academia. Since most search engines now use link-based ranking algorithms, link-based spamming has become a major threat. In this paper, we show that the popular link-based ranking algorithm PageRank, although it has been used successfully in the Google search engine, has a “zero-one gap” flaw that could be exploited to spam PageRank results easily. The “zero-one gap” problem arises from the current ad hoc way of computing the transition probabilities in the random surfing model. We propose a novel algorithm, DirichletRank, which computes these probabilities in a more principled way, based on Bayesian estimation with a Dirichlet prior. DirichletRank is a variant of PageRank, but it does not suffer from the “zero-one gap” problem and is shown analytically to be substantially more resistant to link farm spams than PageRank. Simulation experiments using real web data show that, compared with the original PageRank, DirichletRank is in general significantly more robust against several typical link spams and more stable under link perturbations. Moreover, the experimental results also show that DirichletRank is more effective than PageRank because it allocates transition probabilities more reasonably. Since DirichletRank can be computed as efficiently as PageRank, it scales to large web applications.
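The abstract attributes the "zero-one gap" to how transition probabilities are assigned in the random-surfer model. The following is a minimal sketch, not the authors' implementation, that contrasts the two schemes. It assumes the Dirichlet estimate takes the common smoothed form p(j | i) = (c(i, j) + μ/N) / (|i| + μ), with a uniform prior over all N pages; the paper's exact formulation may differ. The function names, the damping factor 0.85, the value μ = 20 and the toy graph are all hypothetical choices for illustration.

```python
import numpy as np


def pagerank_transitions(adj: np.ndarray, damping: float = 0.85) -> np.ndarray:
    """Standard random-surfer transition matrix; row i is the next-page distribution.

    adj[i, j] counts links from page i to page j. A page with out-links follows one
    of them with probability `damping` and otherwise jumps to a uniformly random page;
    a dangling page (no out-links) jumps uniformly with probability 1.
    """
    n = adj.shape[0]
    out = adj.sum(axis=1)
    P = np.full((n, n), 1.0 / n)  # default row: pure uniform jump (dangling pages)
    for i in range(n):
        if out[i] > 0:
            P[i] = damping * adj[i] / out[i] + (1.0 - damping) / n
    return P


def dirichlet_transitions(adj: np.ndarray, mu: float = 20.0) -> np.ndarray:
    """Dirichlet-smoothed transition matrix (assumed form, not taken from the paper).

    p(j | i) = (c(i, j) + mu / n) / (|i| + mu), where |i| is the out-link count of
    page i and the prior is uniform over all n pages. A dangling page (|i| = 0)
    reduces to the uniform row automatically, so no special case is needed.
    """
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)  # shape (n, 1) so it broadcasts per row
    return (adj + mu / n) / (out + mu)


def jump_probability(out_degree: int, damping: float = 0.85, mu: float = 20.0):
    """Random-jump probability for a page with `out_degree` links, under each scheme."""
    pagerank = 1.0 if out_degree == 0 else 1.0 - damping
    dirichlet = mu / (out_degree + mu)
    return pagerank, dirichlet


def stationary(P: np.ndarray, tol: float = 1e-12, max_iter: int = 10_000) -> np.ndarray:
    """Power iteration for the stationary distribution r satisfying r = r @ P."""
    n = P.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = r @ P
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


if __name__ == "__main__":
    # The "zero-one gap": compare jump probabilities for 0, 1, 2 and 10 out-links.
    for k in (0, 1, 2, 10):
        pr, dr = jump_probability(k)
        print(f"{k:>2} out-links: PageRank jump={pr:.3f}  Dirichlet jump={dr:.3f}")

    # Hypothetical toy graph: pages 0 -> 1 -> 2 -> 0 form a cycle,
    # page 0 also links to page 3, and page 3 is dangling.
    adj = np.array([[0, 1, 0, 1],
                    [0, 0, 1, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 0]], dtype=float)
    print("PageRank:     ", np.round(stationary(pagerank_transitions(adj)), 4))
    print("DirichletRank:", np.round(stationary(dirichlet_transitions(adj)), 4))
```

With these illustrative settings, adding a single out-link drops the PageRank random-jump probability from 1 to 0.15, whereas the assumed Dirichlet estimate moves only from 1 to 20/21 (about 0.95) and then decays gradually as out-links accumulate. Each assumed Dirichlet row is a sparse link term plus a uniform term, so the matrix has the same sparse-plus-rank-one structure as PageRank's, which is consistent with the efficiency claim; the dense matrices here are only for readability.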

Similar resources

A Comparative Study of Page Ranking Algorithms for Information Retrieval

This paper gives an introduction to Web mining, then describes Web structure mining in detail and explores the data structure used by the Web. It also explores different PageRank algorithms and compares those used for information retrieval. In the Web mining part, the basics of Web mining and the Web mining categories are explained. Different PageRank-based algorithms like PageRank ...

A Study on Ranking Method in Retrieving Web Pages Based on Content and Link Analysis: Combination of Fourier Domain Scoring and PageRank Scoring

The ranking module is an important component of the search process that sorts the relevant pages. Since a collection of Web pages carries additional information in the hyperlink structure of the Web, this information can be represented as a link score and then combined with the content score produced by conventional information retrieval techniques. In this paper we report our studies about ranking score of Web pages combined...

T-Rank: Time-Aware Authority Ranking

Analyzing the link structure of the web for deriving a page’s authority and implied importance has deeply affected the way information providers create and link content, the ranking in web search engines, and the users’ access behavior. Due to the enormous dynamics of the web, with millions of pages created, updated, deleted, and linked to every day, timeliness of web pages and links is a cruci...

DRANK+: A Directory Based Pagerank Prediction Method for Fast Pagerank Convergence

As search engines have become increasingly important, Internet users have gradually changed the way they browse the Internet. In recent years, most search engines have used link analysis algorithms to measure the importance of web pages. They employ the conventional flat web graph, constructed from web pages and the link relations among them, to measure the relative importance of web pages. The mo...

Analysis of Link Based Ranking for the Web

In recent years, several techniques based on link analysis have been proposed and used in search engines to rank Web pages. As links are generated by humans, link-based ranking seems to give better results than traditional techniques such as vector-based ranking. However, no studies have been done on their real impact. In this paper we extend global page ranking techniques to Web site rank...

Publication date: 2005